
[Misc] Add gemma3 chat template with pythonic-style function calling #17149


Open
wants to merge 1 commit into main

Conversation

@philipchung commented Apr 25, 2025

This PR adds a Jinja2 chat prompt template for Gemma-3 that generates tool calls in pythonic format. The template is compatible with the existing vLLM pythonic tool call parser, which extracts the tool calls and formats them into the tool_calls field of ChatCompletion responses as ChatCompletionMessageToolCall objects.

The template is a combination of contributions from @jstangroome and @philipchung.

vllm serve models/gemma-3-27b-it  --enable-auto-tool-choice  --tool-call-parser pythonic --chat-template tool_chat_template_gemma3_pythonic.jinja
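For context, a minimal client-side sketch of how this is exercised through the OpenAI-compatible API (the get_current_weather tool definition and the localhost URL are illustrative assumptions, not part of this PR):

from openai import OpenAI

# Point the client at the vLLM OpenAI-compatible server (default port shown; adjust as needed).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative tool definition; any JSON-schema function definition works.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a specified location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g., San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="models/gemma-3-27b-it",
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}],
    tools=tools,
    tool_choice="auto",
)

# With --enable-auto-tool-choice and --tool-call-parser pythonic, a generation like
# [get_current_weather(location="Seoul")] is parsed into the tool_calls field.
print(response.choices[0].message.tool_calls)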

FIX #14734

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the documentation and tool-calling labels on Apr 25, 2025
@paolovic (Contributor)

@philipchung please sign the DCO, e.g., git commit -m "bla" -s

@philipchung (Author)

@paolovic I've signed the DCO now.

@gyin94 commented Apr 26, 2025

Thanks for adding it, but this caused hanging when the tool parser failed on some results, especially with the 4B model.

@paolovic (Contributor)

Thanks for adding it, but this caused hanging when the tool parser failed on some results, especially with the 4B model.

Hi, which precise version of the model and of vLLM are you using?

@Zerohertz (Contributor) left a comment

LGTM! 👍

I've tested the function calling flow with the changes in this PR, and it appears to be working correctly. The model successfully identified the need for a function call based on the user prompt and the provided tool definition, extracted the necessary arguments, and then generated an appropriate final response after receiving the function's result.

Here's a summary of the test case and results:


<bos><start_of_turn>user
Tools (functions) are available. If you decide to invoke one or more of the tools, you must respond with a python list of the function calls.
Example Format: [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] 
Do not use variables. DO NOT USE MARKDOWN SYNTAX. You SHOULD NOT include any other text in the response if you call a function. If none of the functions can be used, point it out. If you lack the parameters required by the function, also point it out.
Here is a list of functions in JSON format that you can invoke.
[
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a specified location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": [
                            "celsius",
                            "fahrenheit"
                        ],
                        "description": "The unit of temperature"
                    }
                },
                "required": [
                    "location"
                ]
            }
        }
    }
]

How's the weather in Seoul?<end_of_turn>
<start_of_turn>model
<bos><start_of_turn>user
How's the weather in Seoul?<end_of_turn>
<start_of_turn>model
[get_current_weather(location="Seoul", unit="celsius")]<end_of_turn>
<start_of_turn>user
<tool_response>
{"location": "Seoul", "temperature": 22, "unit": "celsius", "forecast": ["sunny", "windy"], "humidity": 60}</tool_response><end_of_turn>
<start_of_turn>model

2025-05-02 18:15:47.532 | INFO     | __main__:test_function_call:87 - Messages: [
  {
    "role": "user",
    "content": "How's the weather in Seoul?"
  }
]
2025-05-02 18:15:47.532 | INFO     | __main__:test_function_call:88 - Tools: [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a specified location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "celsius",
              "fahrenheit"
            ],
            "description": "The unit of temperature"
          }
        },
        "required": [
          "location"
        ]
      }
    }
  }
]
2025-05-02 18:15:47.961 | WARNING  | __main__:test_function_call:96 - Tool call detected!
2025-05-02 18:15:47.961 | DEBUG    | __main__:test_function_call:101 - Function: get_current_weather
2025-05-02 18:15:47.961 | DEBUG    | __main__:test_function_call:102 - Arguments: {'location': 'Seoul', 'unit': 'celsius'}
2025-05-02 18:15:47.961 | DEBUG    | __main__:test_function_call:105 - Function result: {'location': 'Seoul', 'temperature': 22, 'unit': 'celsius', 'forecast': ['sunny', 'windy'], 'humidity': 60}
2025-05-02 18:15:47.962 | INFO     | __main__:test_function_call:117 - Continuing conversation with function result...
2025-05-02 18:15:48.474 | INFO     | __main__:test_function_call:119 - 
Final AI response: The weather in Seoul is currently 22°C. It's sunny and windy with 60% humidity. 

@RomaricLocuta

Hi, thank you for your contribution!
I tested it on Gemma-27B-IT, and it works fine in many cases! Every tool call includes all the required arguments.

However, I've observed on several occasions that the LLM starts with a text reply and only calls a tool at the very end, even though the prompt explicitly and clearly tells it to invoke one of the tools.

Example:

US GDP in the fourth quarter of 2024 was an annual rate of 2.4 percent, equivalent to $28.25 trillion. New York state GDP in the fourth quarter of 2024 was $2,346,932 (in current dollars). New York state GDP represents approximately 8.31% of the US GDP.

[transfer_to_math_agent()]  

It seems that the instruction “You SHOULD NOT include any other text in the response if you call a function” is not being followed by Gemma. Do you know whether vLLM can be configured to let the model return both text and a tool invocation in the same response?
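One possible client-side workaround (a rough sketch, assuming it is acceptable to post-process the raw text before parsing; this is not something the PR or the vLLM parser provides): since the pythonic parser expects the entire output to be the call list, one can heuristically pre-extract a bracketed call list from mixed text:

import re

def extract_pythonic_calls(text: str) -> str | None:
    """Heuristically pull a [func(...)] call list out of mixed model output.

    Rough sketch only: returns the last bracketed span that looks like a list
    of function calls; nested brackets inside string arguments will fool it.
    """
    matches = re.findall(r"\[\s*\w+\(.*?\)\s*(?:,\s*\w+\(.*?\)\s*)*\]", text, re.DOTALL)
    return matches[-1] if matches else None

# Example with a text reply followed by a trailing call, as observed above.
output = "New York state GDP represents approximately 8.31% of the US GDP.\n\n[transfer_to_math_agent()]"
print(extract_pythonic_calls(output))  # [transfer_to_math_agent()]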

@maziyarpanahi

Any updates on when this is going to be merged? We can improve it once more people get to use it and share feedback.


{#- Insert system message content (if present) at the beginning of the first message. -#}
{%- if loop.first -%}
{{ first_user_prefix }}


Suggested change:
- {{ first_user_prefix }}
+ {#- Append system message with tool information if using tools in message request. -#}
+ {%- if tools is not none -%}
+ {{- "Tools (functions) are available. If you decide to invoke one or more of the tools, you must respond with a python list of the function calls.\n" -}}
+ {{- "Example Format: [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] \n" -}}
+ {{- "Do not use variables. DO NOT USE MARKDOWN SYNTAX. You SHOULD NOT include any other text in the response if you call a function. If none of the functions can be used, point it out. If you lack the parameters required by the function, also point it out.\n" -}}
+ {{- "Here is a list of functions in JSON format that you can invoke.\n" -}}
+ {{- tools | tojson(indent=4) -}}
+ {{- "\n\n" -}}
+ {%- endif -%}
+ {{ first_user_prefix }}

Gemma-3 does not properly handle tool definitions added after the system prompt. When the order is switched, it seems to be OK.


I cannot for the life of me get Gemma3 to consistently place the tool call at the beginning of the response. Despite reiterating within the system prompt that it needs to be there, and this template containing the same instruction, more often than not I see Gemma place the tool call at the end of the response.

Is this a limitation of the template? Can I have it detect a tool call anywhere in the response? I would really love to continue using Gemma3, but we might have to switch if we can't get more robust function-calling support.

@DavidCatalano

I was unable to get the proposed chat template to work with vLLM.

Failed attempts:

  • gemma-3-4b-it (BF16 Transformers)
  • gemma-3-12b-it (BF16 Transformers)
  • gemma-3-27b-it (w8a8 Transformers)

I utilized the chat template changes proposed by @vriesdemichael.
I'll be monitoring to see if others can achieve success.

vllm  | INFO 06-29 13:32:31 [logger.py:43] Received request chatcmpl-956be9e88bc34fbf9fbf0ddbdd1b8dcd: prompt: '<bos><start_of_turn>user\nCurrent model: vllm-gemma-3-4b-it\nCurrent date: 2025-06-29\nYou are a helpful assistant with access to tools. You are capable of returning zero text if a tool is used. If you determine you should use a tool you must ONLY call the tool and not output other text. For example, if you should use a search tool do not output text other than the tool cal.\n\nYou have access to the following tools to help respond to the user. To call tools, please respond with a python list of the calls. DO NOT USE MARKDOWN SYNTAX.\nRespond in the format [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] \nDo not use variables.\n\n{\n    "type": "function",\n    "function": {\n        "name": "mcp__tavily_search__tavily-crawl",\n        "description": "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.",\n        "parameters": {\n            "type": "object",\n            "properties": {\n                "allow_external": {\n                    "default": false,\n                    "description": "Whether to allow following links that go to external domains",\n                    "type": "boolean"\n                },\n                "categories": {\n                    "default": [],
...TRUNCATED...
            "additionalProperties": false\n        }\n    }\n}\n\nYou have access to search tools, make a tool call immediately when needed. My prompt begins now: Search for the latest news on the budget deficit.<end_of_turn>\n<start_of_turn>model\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.01, top_p=1.0, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=38105, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None.
vllm  | INFO:     10.0.0.254:53992 - "POST /v1/chat/completions HTTP/1.1" 200 OK
vllm  | INFO 06-29 13:32:31 [async_llm.py:271] Added request chatcmpl-956be9e88bc34fbf9fbf0ddbdd1b8dcd.
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184] Error trying to handle streaming tool call.
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184] Traceback (most recent call last):
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py", line 128, in extract_tool_calls_streaming
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184]     raise _UnexpectedAstError(
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184] 
...
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184]     raise _UnexpectedAstError(
vllm  | ERROR 06-29 13:32:32 [pythonic_tool_parser.py:184] vllm.entrypoints.openai.tool_parsers.pythonic_tool_parser._UnexpectedAstError: Tool output must be a list of function calls

@sjzy23 commented Jul 4, 2025

Hi~ When I try this chat template, the tool call sometimes comes back wrapped in Markdown syntax.

{
  content: "```tool_code
[get_weather()]
```"
  additional_kwargs: {}
  response_metadata: {
    finish_reason: "stop"
    model_name: "gemma-3-27b-it"
  }
  type: "ai"
  name: "supervisor"
  id: "run--2f54af5d-8b3d-42d5-8b86-de66aea322cd"
  example: false
  tool_calls: []
  invalid_tool_calls: []
  usage_metadata: null
}
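
If it helps while this is sorted out: a small, hypothetical pre-processing step (not part of this PR or the vLLM parser) that strips a wrapping Markdown fence such as ```tool_code before the text reaches the pythonic parser:

import re

def strip_tool_code_fence(text: str) -> str:
    """Remove a wrapping ```tool_code ... ``` (or bare ```/```python) fence, if present."""
    match = re.match(r"^\s*```(?:tool_code|python)?\s*\n(.*?)\n\s*```\s*$", text, re.DOTALL)
    return match.group(1) if match else text

wrapped = "```tool_code\n[get_weather()]\n```"
print(strip_tool_code_fence(wrapped))  # [get_weather()]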

@LiuYuWei

Can anyone help with this chat template? I hope to use vLLM + Google ADK + Gemma 3 as our AI agent service, but this template does not work with the ADK service.


Successfully merging this pull request may close these issues.

[Usage]: Tool calling for gemma-3